Comments on 'Parallel Algorithms for Hierarchical Clustering and Cluster Validity'

Author

  • Fionn Murtagh
Abstract

The purpose of this correspondence is to indicate that state-of-the-art hierarchical clustering algorithms have $O(n^2)$ time complexity and should be referred to in preference to the $O(n^3)$ algorithms, which were described in many texts in the 1970s. We also point out some further references on the parallelization of hierarchic clustering algorithms.
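As a concrete illustration of the quadratic bound, the following is a minimal Python sketch of the nearest-neighbour chain algorithm, one well-known route to $O(n^2)$ time for agglomerative methods whose Lance–Williams update is reducible (single, complete and average link are handled here). It is not taken from this correspondence: the function name `nn_chain_linkage`, the list-of-lists dissimilarity matrix and the `(i, j, height, size)` merge format are assumptions made for the example. Each nearest-neighbour search is a linear scan over the active clusters and at most about $3n$ searches are needed, so the total is $O(n^2)$ time, with $O(n^2)$ memory for the matrix.

```python
import math


def nn_chain_linkage(dist, method="complete"):
    """Agglomerative clustering with the nearest-neighbour (NN) chain algorithm.

    dist   -- full symmetric dissimilarity matrix (list of lists), n x n
    method -- "single", "complete" or "average" (all reducible linkages)
    Returns merges (i, j, height, size) in the order found; i and j are the
    representatives (smallest original index) of the two merged clusters.
    Hypothetical helper for illustration only.
    """
    if method not in ("single", "complete", "average"):
        raise ValueError(f"unsupported method: {method}")
    n = len(dist)
    d = [list(row) for row in dist]   # working copy, updated in place
    size = [1] * n                    # cluster sizes, indexed by representative
    active = set(range(n))            # representatives of current clusters
    chain, merges = [], []

    while len(active) > 1:
        if not chain:
            chain.append(min(active))  # start a new chain at any active cluster
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Linear scan for a's nearest active neighbour; ties favour prev.
        b = prev
        best = d[a][prev] if prev is not None else math.inf
        for c in active:
            if c != a and d[a][c] < best:
                b, best = c, d[a][c]
        if b != prev:
            chain.append(b)            # not reciprocal yet: extend the chain
            continue
        # a and b are reciprocal nearest neighbours, so merge them.
        chain.pop()
        chain.pop()
        keep, drop = min(a, b), max(a, b)
        for c in active:
            if c == a or c == b:
                continue
            if method == "single":
                new = min(d[a][c], d[b][c])
            elif method == "complete":
                new = max(d[a][c], d[b][c])
            else:  # average link (UPGMA), weighted by cluster sizes
                new = (size[a] * d[a][c] + size[b] * d[b][c]) / (size[a] + size[b])
            d[keep][c] = d[c][keep] = new   # Lance-Williams update
        size[keep] += size[drop]
        active.remove(drop)
        merges.append((keep, drop, best, size[keep]))
    return merges
```

For example, for the one-dimensional points 0, 1, 3 and 7 (dissimilarity $|p - q|$) with complete link, the sketch returns `[(0, 1, 1, 2), (0, 2, 3, 3), (0, 3, 7, 4)]`. In general the chain can find merges out of height order; for these reducible linkages, sorting the merges by height gives the usual agglomeration sequence.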
Li [6] describes parallel implementations of hierarchical clustering algorithms that achieve $O(n^2)$ computational time complexity and thereby improve on the baseline of sequential implementations. The latter are stated to be $O(n^3)$, with the exception of the single link method. It is inappropriate to use, as one's baseline, implementations that could only be described as of 1970s vintage. Surely, it should have been noted that $O(n^2)$ time implementations exist for most of the widely known hierarchical clustering methods. Average-time implementations that come close to $O(n)$ are also known. Rohlf [13] discusses an $O(n \log\log n)$ expected-time algorithm for the minimal spanning tree, …
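The single link case is the easiest one in which to see a quadratic bound: the single-link hierarchy can be read off a minimal spanning tree, and Prim's algorithm on a full dissimilarity matrix builds that tree in $O(n^2)$ time. The minimal Python sketch below assumes a list-of-lists matrix and uses a hypothetical function name, `single_link_via_mst`; it is the plain quadratic Prim algorithm, not the $O(n \log\log n)$ expected-time method that Rohlf discusses.

```python
import math


def single_link_via_mst(dist):
    """Single-link hierarchy from a minimal spanning tree built by Prim's algorithm.

    dist -- full symmetric dissimilarity matrix (list of lists), n x n.
    Returns the n-1 MST edges (u, v, weight) sorted by weight; read in that
    order, they are the single-link merges. Hypothetical helper for illustration.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each vertex
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0
    edges = []
    for _ in range(n):
        # Closest vertex not yet in the tree: an O(n) scan.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from the newly added vertex: another O(n) scan.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return sorted(edges, key=lambda e: e[2])
```

On the one-dimensional points 0, 1, 3 and 7 (dissimilarity $|p - q|$), the sketch returns `[(0, 1, 1), (1, 2, 2), (2, 3, 4)]`, i.e. single-link merge heights 1, 2 and 4. Feeding the sorted edges to a union-find structure in that order recovers the cluster memberships at each level.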

Similar resources

Parallel Algorithms for Hierarchical Clustering and Cluster Validity

This correspondence proposes parallel algorithms on SIMD machines for hierarchical clustering and cluster validity computation. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both the pattern matrix and the proximity matrix. For a problem with N patterns, the number of memory accesses is reduced from $O(N^3)$ on a sequential machine to $O(N^2)$...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for graph clustering, but it gives us no choice in selecting the number of clusters and the number of members ...

Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop

Document clustering is one of the important areas in data mining. Hadoop is used by companies such as Yahoo, Google, Facebook and Twitter to implement real-time applications. Emails, social media blogs, movie review comments and books are used for document clustering. This paper focuses on document clustering using Hadoop. Hadoop is the new technology used for parallel computing ...

MLCA: A Multi-Level Clustering Algorithm for Routing in Wireless Sensor Networks

Energy constraint is the biggest challenge in wireless sensor networks, because each sensor node is powered by a battery that, owing to the applications of these networks, can be neither recharged nor replaced. One of the successful methods for saving energy in these networks is clustering, which has made cluster-based routing algorithms successful for these networks....

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM), a fuzzy learning scheme inspired by some behavioral features of human brain functionality. The high-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Journal title:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume: 14

Publication date: 1992